Raise from disconnect error in xopen #5382

lhoestq · 2022-12-20T15:52:44Z

This way we can see the underlying cause of the disconnect.

related to #5374
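The change comes down to Python exception chaining. When a streaming read gives up after its retries, the new error is raised from the last low-level disconnect error, so the traceback reports the original cause instead of hiding it. Below is a minimal sketch of that pattern, assuming a retry wrapper installed on a file object's read method. The names add_retries_to_read, max_retries, retry_interval and retryable are hypothetical, and ConnectionResetError stands in for whatever disconnect error the HTTP client raises. This is an illustration, not the actual datasets code.

```python
import time


def add_retries_to_read(file_obj, max_retries=3, retry_interval=0.0,
                        retryable=(ConnectionResetError, TimeoutError)):
    """Hypothetical helper: retry file_obj.read on transient disconnects.

    If every attempt fails, raise ConnectionError *from* the last
    underlying error so the original cause stays in the traceback.
    Assumes file_obj is a Python object whose `read` attribute can be
    reassigned (true for pure-Python file wrappers, not for io.BufferedReader).
    """
    read = file_obj.read

    def read_with_retries(*args, **kwargs):
        disconnect_err = None
        for _ in range(max_retries):
            try:
                return read(*args, **kwargs)
            except retryable as err:
                # Keep the most recent low-level error for chaining.
                disconnect_err = err
                time.sleep(retry_interval)
        # All attempts failed: chain the cause instead of discarding it.
        raise ConnectionError("Server Disconnected") from disconnect_err

    file_obj.read = read_with_retries
    return file_obj


class FlakyFile:
    """Toy file object that always loses its connection."""

    def read(self, n=-1):
        raise ConnectionResetError("peer closed connection")


f = add_retries_to_read(FlakyFile(), max_retries=2)
try:
    f.read()
except ConnectionError as e:
    print(repr(e.__cause__))  # ConnectionResetError('peer closed connection')
```

Without `from disconnect_err`, the final error would still carry the last exception implicitly as `__context__` only if raised inside the `except` block; raising after the loop with an explicit `from` makes the cause reliable and shows up as "The above exception was the direct cause of the following exception" in the traceback.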

HuggingFaceDocBuilderDev · 2022-12-20T15:57:33Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq · 2023-01-25T15:47:23Z

Could you review this small PR, @albertvillanova? :)

albertvillanova

Thank you.

I was wondering whether it would be better to use the _retry function, but this is OK...

github-actions · 2023-01-26T09:51:12Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.011200 / 0.011353 (-0.000153)	0.006156 / 0.011008 (-0.004852)	0.119072 / 0.038508 (0.080564)	0.042616 / 0.023109 (0.019507)	0.348329 / 0.275898 (0.072431)	0.418550 / 0.323480 (0.095070)	0.009302 / 0.007986 (0.001316)	0.004596 / 0.004328 (0.000267)	0.090111 / 0.004250 (0.085860)	0.053341 / 0.037052 (0.016289)	0.361234 / 0.258489 (0.102745)	0.400427 / 0.293841 (0.106586)	0.045601 / 0.128546 (-0.082945)	0.013806 / 0.075646 (-0.061841)	0.393178 / 0.419271 (-0.026094)	0.056809 / 0.043533 (0.013276)	0.344090 / 0.255139 (0.088951)	0.370610 / 0.283200 (0.087410)	0.125728 / 0.141683 (-0.015955)	1.671931 / 1.452155 (0.219776)	1.703143 / 1.492716 (0.210427)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.226534 / 0.018006 (0.208527)	0.496487 / 0.000490 (0.495998)	0.002235 / 0.000200 (0.002035)	0.000094 / 0.000054 (0.000039)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.031298 / 0.037411 (-0.006113)	0.137740 / 0.014526 (0.123214)	0.153497 / 0.176557 (-0.023059)	0.204201 / 0.737135 (-0.532934)	0.162324 / 0.296338 (-0.134014)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.475922 / 0.215209 (0.260712)	4.682344 / 2.077655 (2.604689)	2.107387 / 1.504120 (0.603267)	1.884792 / 1.541195 (0.343597)	2.003180 / 1.468490 (0.534690)	0.810212 / 4.584777 (-3.774564)	4.631047 / 3.745712 (0.885334)	4.467606 / 5.269862 (-0.802256)	2.334196 / 4.565676 (-2.231480)	0.099713 / 0.424275 (-0.324562)	0.014732 / 0.007607 (0.007125)	0.604587 / 0.226044 (0.378543)	5.951679 / 2.268929 (3.682751)	2.704761 / 55.444624 (-52.739863)	2.280695 / 6.876477 (-4.595781)	2.279489 / 2.142072 (0.137417)	0.962474 / 4.805227 (-3.842753)	0.195279 / 6.500664 (-6.305385)	0.071503 / 0.075469 (-0.003966)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.558037 / 1.841788 (-0.283751)	17.722140 / 8.074308 (9.647832)	16.229016 / 10.191392 (6.037624)	0.177148 / 0.680424 (-0.503276)	0.034162 / 0.534201 (-0.500039)	0.513945 / 0.579283 (-0.065338)	0.533542 / 0.434364 (0.099178)	0.672457 / 0.540337 (0.132119)	0.762390 / 1.386936 (-0.624546)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009739 / 0.011353 (-0.001613)	0.006095 / 0.011008 (-0.004914)	0.105968 / 0.038508 (0.067460)	0.046229 / 0.023109 (0.023120)	0.449156 / 0.275898 (0.173258)	0.462182 / 0.323480 (0.138702)	0.006981 / 0.007986 (-0.001004)	0.004867 / 0.004328 (0.000539)	0.082142 / 0.004250 (0.077891)	0.058652 / 0.037052 (0.021600)	0.454542 / 0.258489 (0.196052)	0.494910 / 0.293841 (0.201069)	0.047159 / 0.128546 (-0.081387)	0.014677 / 0.075646 (-0.060969)	0.370819 / 0.419271 (-0.048452)	0.064603 / 0.043533 (0.021070)	0.441514 / 0.255139 (0.186375)	0.442802 / 0.283200 (0.159603)	0.138603 / 0.141683 (-0.003080)	1.692810 / 1.452155 (0.240655)	1.894596 / 1.492716 (0.401880)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.281681 / 0.018006 (0.263675)	0.532693 / 0.000490 (0.532203)	0.005484 / 0.000200 (0.005284)	0.000156 / 0.000054 (0.000102)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.032994 / 0.037411 (-0.004417)	0.134614 / 0.014526 (0.120088)	0.142286 / 0.176557 (-0.034270)	0.187220 / 0.737135 (-0.549916)	0.144897 / 0.296338 (-0.151441)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.519536 / 0.215209 (0.304327)	5.214429 / 2.077655 (3.136775)	2.612575 / 1.504120 (1.108455)	2.369085 / 1.541195 (0.827891)	2.503157 / 1.468490 (1.034667)	0.834827 / 4.584777 (-3.749950)	4.586789 / 3.745712 (0.841077)	4.472605 / 5.269862 (-0.797257)	2.314471 / 4.565676 (-2.251205)	0.095817 / 0.424275 (-0.328458)	0.014086 / 0.007607 (0.006478)	0.605875 / 0.226044 (0.379831)	6.153143 / 2.268929 (3.884214)	3.187456 / 55.444624 (-52.257169)	2.755377 / 6.876477 (-4.121100)	2.777118 / 2.142072 (0.635046)	0.967285 / 4.805227 (-3.837942)	0.199202 / 6.500664 (-6.301462)	0.075979 / 0.075469 (0.000510)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.481758 / 1.841788 (-0.360030)	18.053769 / 8.074308 (9.979461)	15.558780 / 10.191392 (5.367388)	0.226135 / 0.680424 (-0.454288)	0.021668 / 0.534201 (-0.512533)	0.562618 / 0.579283 (-0.016666)	0.518183 / 0.434364 (0.083819)	0.628580 / 0.540337 (0.088243)	0.740368 / 1.386936 (-0.646568)

raise disconnect error

22be20e

lhoestq requested a review from albertvillanova December 20, 2022 15:52

albertvillanova approved these changes Jan 26, 2023

lhoestq merged commit 4e4d46e into main Jan 26, 2023

lhoestq deleted the raise-err-when-disconnect branch January 26, 2023 09:42
